Geothermal machine learning analysis: Brady site, NV

This notebook is part of GTcloud.jl: GeoThermal Cloud for Machine Learning.

Machine learning analyses are performed using the SmartTensors machine learning framework.

This notebook demonstrates how the NMFk module of SmartTensors can be applied to perform unsupervised geothermal machine-learning analyses.

More information on how the ML results are interpreted to provide geothermal insights is available in our research paper.

Import required Julia modules

If NMFk is not installed, first execute the following in the Julia REPL: import Pkg; Pkg.add("NMFk"); Pkg.add("DelimitedFiles"); Pkg.add("JLD"); Pkg.add("Gadfly"); Pkg.add("Cairo"); Pkg.add("Fontconfig"); Pkg.add("Mads").

Load and pre-process the dataset

Set up the working directory containing the Brady site data

Load the data file

Populate the missing well names

Set missing entries to zero
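A minimal sketch of these two clean-up steps in base Julia. It assumes (hypothetically) that each well's name appears only on the first row of that well's block and that unmeasured values are read as missing; the variable and function names are illustrative, not the notebook's.

```julia
# Hypothetical raw table: well names (blank on continuation rows) and
# attribute values (`missing` where not measured).
rawnames = ["W1", "", "", "W2", ""]
rawvals = Union{Missing,Float64}[1.0 missing; 2.0 3.0; missing 4.0; 5.0 6.0; 7.0 missing]

# Forward-fill each blank well name with the nearest non-blank name above it
function fill_well_names(names::AbstractVector{<:AbstractString})
    filled = String.(names)          # copy so the input is not modified
    for i in 2:length(filled)
        if isempty(filled[i])
            filled[i] = filled[i-1]
        end
    end
    return filled
end

wellnames = fill_well_names(rawnames)

# Set missing attribute entries to zero (result is a plain Float64 matrix)
vals = coalesce.(rawvals, 0.0)

println(wellnames)   # ["W1", "W1", "W1", "W2", "W2"]
```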

Define names of the data attributes (matrix columns)

Short attribute names are used for coding.

Long attribute names are used for plotting and visualization.

Define the attributes that will be processed

Index the attributes that will be processed

Display information about the processed data (min, max, count):
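The selection, indexing, and summary steps can be sketched as follows. The attribute names and values are hypothetical, and "count" is taken here to mean the number of nonzero entries (an assumption, since missing entries were set to zero).

```julia
# Hypothetical attribute names (short for code, long for plots) and data
shortnames = ["temp", "grad", "fault"]
longnames = ["Temperature", "Temperature gradient", "Fault intersection"]
X = [10.0 0.5 0.0;
     20.0 0.7 1.0;
     30.0 0.0 1.0]            # rows = records, columns = attributes

selected = ["temp", "fault"]  # attributes to process

# Column index of each selected attribute (`something` errors if a name is not found)
idx = [something(findfirst(==(s), shortnames)) for s in selected]

# Per-attribute summary: (long name, min, max, number of nonzero entries)
attrstats = [(longnames[i], minimum(X[:, i]), maximum(X[:, i]), count(!iszero, X[:, i])) for i in idx]
foreach(println, attrstats)
```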

Get well locations and production

Define well types

Display information about processed well attributes

Collect the well data into a 3D tensor

Tensor indices (dimensions) define depths, attributes, and wells.
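A sketch of the tensor assembly, assuming (hypothetically) that each well's data are already a depths × attributes matrix sampled on a shared depth grid. The well names and values are placeholders.

```julia
# Hypothetical per-well data: each well has a (depths × attributes) matrix
nd, na = 4, 2
wells = ["W1", "W2", "W3"]
welldata = Dict(w => fill(Float64(k), nd, na) for (k, w) in enumerate(wells))

# Stack the well matrices into a tensor T[depth, attribute, well]
T = Array{Float64}(undef, nd, na, length(wells))
for (k, w) in enumerate(wells)
    T[:, :, k] = welldata[w]
end

size(T)   # (4, 2, 3)
```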

Define the maximum depth

Only data no deeper than the maximum depth are included in the analyses.

The maximum depth is set to 750 m.
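One way to apply the depth cutoff, assuming a hypothetical regular depth grid indexed by the tensor's first dimension. Only samples no deeper than 750 m are kept.

```julia
maxdepth = 750.0                      # maximum depth used in the analyses (m)
depths = collect(0.0:250.0:1500.0)    # hypothetical depth grid (m) for T's first index
T = rand(length(depths), 2, 3)        # placeholder tensor: depth × attribute × well

keep = depths .<= maxdepth            # true for samples no deeper than maxdepth
Tcut = T[keep, :, :]

size(Tcut)   # (4, 2, 3)
```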

Normalize tensor slices associated with each attribute
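The normalization method is not specified here. The sketch below assumes min-max scaling of each attribute slice to [0, 1], which keeps all entries nonnegative (as NMF requires) and puts attributes measured in different units on a comparable scale; normalize_attributes! is a hypothetical name.

```julia
# Min-max normalize each attribute slice T[:, a, :] to [0, 1] in place
# (assumption: the notebook's exact normalization may differ).
function normalize_attributes!(T::Array{Float64,3})
    for a in axes(T, 2)
        s = view(T, :, a, :)          # all depths and wells for attribute a
        lo, hi = extrema(s)
        if hi > lo
            s .= (s .- lo) ./ (hi - lo)
        else
            fill!(s, 0.0)             # constant attribute: nothing to scale
        end
    end
    return T
end

T = reshape(collect(1.0:12.0), 2, 3, 2)   # toy depth × attribute × well tensor
normalize_attributes!(T)
```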

Define problem setup variables

Plot well data

An HTML file named ../map/dataset-set00-v9-inv.html is generated to map the site data.

The map provides an interactive visualization of the site data (it can also be opened in any browser).

The map shows the locations of the Dry, Injection, and Production wells.

Perform ML analyses

For the ML analyses, the data tensor can be flattened into a data matrix using two different approaches: Type 1 (focus on well locations) and Type 2 (focus on well attributes).

After flattening, the NMFk algorithm factorizes the data matrix X into two nonnegative matrices, W and H, such that X ≈ W H. For more information, see the NMFk website.
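To illustrate the factorization itself, here is a plain nonnegative matrix factorization using the classic Lee and Seung multiplicative updates. This is a generic sketch, not NMFk's solver: NMFk's own optimization, repeated random restarts, and selection of the number of signatures are not reproduced, and nmf_mu is a hypothetical name.

```julia
using LinearAlgebra, Random

# Plain NMF with multiplicative updates: X ≈ W * H with W, H ≥ 0.
function nmf_mu(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer=2000, seed::Integer=1)
    any(<(0), X) && throw(ArgumentError("X must be nonnegative"))
    rng = MersenneTwister(seed)
    m, n = size(X)
    W = rand(rng, m, k)               # random nonnegative initial guesses
    H = rand(rng, k, n)
    ϵ = eps(Float64)                  # guards against division by zero
    for _ in 1:maxiter
        H .*= (W' * X) ./ (W' * W * H .+ ϵ)
        W .*= (X * H') ./ (W * H * H' .+ ϵ)
    end
    return W, H
end

X = rand(MersenneTwister(0), 6, 2) * rand(MersenneTwister(1), 2, 5)   # toy rank-2 data
W, H = nmf_mu(X, 2)
relerr = norm(X - W * H) / norm(X)
```

The multiplicative form keeps W and H nonnegative automatically, because each update multiplies the current entries by a ratio of nonnegative quantities.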

Type 1 flattening: Focus on well locations

Flatten the tensor into a matrix

Matrix rows merge the depth and attribute dimensions.

Matrix columns represent the well locations.
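Because Julia arrays are column-major, Type 1 flattening is a plain reshape: within each column, depth varies fastest and attributes follow one another. A sketch on a toy tensor (the sizes are illustrative):

```julia
nd, na, nw = 2, 3, 4                                  # depths, attributes, wells (toy sizes)
T = reshape(collect(1.0:nd*na*nw), nd, na, nw)        # toy tensor T[depth, attribute, well]

# Type 1: rows = (depth, attribute) pairs, columns = wells
X1 = reshape(T, nd * na, nw)
```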

Perform NMFk analyses

Here, the NMFk results are loaded from prior ML runs.

As seen from the output above, the NMFk analyses identified 6 as the optimal number of geothermal signatures in the dataset.

Solutions with fewer than 6 signatures underfit the data.

Solutions with more than 6 signatures overfit the data and are unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2, 5, and 6 signatures.
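As a rough illustration of such a selection, the sketch below keeps the numbers of signatures whose silhouette width (a robustness measure) exceeds a threshold. The silhouette values and the 0.5 threshold are hypothetical, not this site's results, and NMFk's actual acceptance criteria may differ.

```julia
# Hypothetical silhouette widths for k = 2:7 (illustrative values only)
krange = 2:7
silhouette = [0.95, 0.40, 0.85, 0.80, 0.10, -0.20]
threshold = 0.5                       # hypothetical acceptance threshold

# Keep every k whose silhouette width exceeds the threshold
acceptable = collect(krange[silhouette .> threshold])
```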

Post-process NMFk results

Number of signatures

Plot of the solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The plot above also demonstrates that the acceptable solutions contain 2, 5, and 6 signatures.
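For reference, the silhouette width of a clustering can be computed as below. This is the generic definition with cosine distance (an assumption); how NMFk clusters the signatures obtained from repeated runs before computing it is not shown, and mean_silhouette is a hypothetical name.

```julia
using LinearAlgebra

# Cosine distance between two nonzero vectors
cosdist(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

# Mean silhouette width: for each point, a = mean distance to its own cluster,
# b = smallest mean distance to another cluster, s = (b - a) / max(a, b).
function mean_silhouette(points::AbstractVector{<:AbstractVector}, labels::AbstractVector{<:Integer})
    clusters = unique(labels)
    length(clusters) >= 2 || throw(ArgumentError("need at least two clusters"))
    n = length(points)
    s = zeros(n)
    for i in 1:n
        same = [j for j in 1:n if j != i && labels[j] == labels[i]]
        isempty(same) && continue     # singleton cluster: s = 0 by convention
        a = sum(cosdist(points[i], points[j]) for j in same) / length(same)
        b = Inf
        for c in clusters
            c == labels[i] && continue
            other = [j for j in 1:n if labels[j] == c]
            b = min(b, sum(cosdist(points[i], points[j]) for j in other) / length(other))
        end
        s[i] = (b - a) / max(a, b)
    end
    return sum(s) / n
end

pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mean_silhouette(pts, [1, 1, 2, 2])   # close to 1: well-separated clusters
```

Values near 1 indicate compact, well-separated clusters; values near or below 0 indicate that the grouping is not robust.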

Analysis of all the acceptable solutions

The ML solutions containing an acceptable number of signatures are further analyzed as follows:

Analysis of the 6-signature solution

The results for the 6-signature solution presented above are discussed further here.

The well attributes are clustered into 6 groups:

This grouping is based on analyses of the attribute matrix W:

[Figure: attributes-6-labeled-sorted]

Note that the attribute matrix W is automatically modified to account for the vertical depths used in characterizing the analyzed datasets.
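A sketch of how attributes could be assigned to groups from a Type 1 W matrix: reshape W back to depth × attribute × signature, aggregate over depth, and assign each attribute to its dominant signature. Summing over depth is an assumption; the notebook's actual depth adjustment of W may differ, and attribute_groups is a hypothetical name.

```julia
# W from Type 1 flattening: rows are (depth, attribute) pairs with depth varying
# fastest, columns are signatures.
function attribute_groups(W::AbstractMatrix{<:Real}, nd::Integer, na::Integer)
    size(W, 1) == nd * na || throw(DimensionMismatch("W must have nd * na rows"))
    k = size(W, 2)
    W3 = reshape(W, nd, na, k)                  # depth × attribute × signature
    Wa = dropdims(sum(W3; dims=1); dims=1)      # aggregate over depth (assumption: sum)
    return [argmax(Wa[a, :]) for a in 1:na]     # dominant signature of each attribute
end

# Toy W: 2 depths × 3 attributes (6 rows), 2 signatures
W = [0.1 0.9; 0.2 0.8; 0.7 0.1; 0.6 0.3; 0.0 0.5; 0.4 0.2]
attribute_groups(W, 2, 3)   # [2, 1, 2]
```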

The well locations are also clustered into 6 groups:

This grouping is based on analyses of the location matrix H:

[Figure: locations-6-labeled-sorted]
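Similarly, each well can be assigned to the signature with the largest weight in its column of H. Scaling each row of H to a maximum of 1 first is an assumption made here so that signatures with large overall magnitudes do not dominate; location_groups is a hypothetical name.

```julia
# H from Type 1 flattening: rows are signatures, columns are wells.
function location_groups(H::AbstractMatrix{<:Real})
    # Scale each signature (row) to a maximum of 1 (assumption);
    # a row of all zeros would produce NaN values here.
    Hn = H ./ maximum(H; dims=2)
    return [argmax(Hn[:, w]) for w in axes(Hn, 2)]   # dominant signature of each well
end

H = [4.0 1.0 2.0; 0.2 0.4 0.1]   # toy H: 2 signatures × 3 wells
location_groups(H)               # [1, 2, 1]
```

Without the row scaling, the second well would be assigned to signature 1 (1.0 > 0.4), which shows how much this choice can matter.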

The map ../figures-set00-v9-inv-750-1000-daln/locations-6-map.html provides an interactive visualization of the extracted well location groups (the HTML file can also be opened in any browser).

More information on how the ML results are interpreted to provide geothermal insights is available in our research paper.

Type 2 flattening: Focus on well attributes

Flatten the tensor into a matrix

Matrix rows merge the depth and well-location dimensions.

Matrix columns represent the well attributes.
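Type 2 flattening needs the attribute dimension moved last before reshaping, so that depth and well are the merged row dimensions. A sketch on a toy tensor (the sizes are illustrative):

```julia
nd, na, nw = 2, 3, 4                                  # depths, attributes, wells (toy sizes)
T = reshape(collect(1.0:nd*na*nw), nd, na, nw)        # toy tensor T[depth, attribute, well]

# Type 2: rows = (depth, well) pairs, columns = attributes.
# permutedims reorders T to T[depth, well, attribute] before merging.
X2 = reshape(permutedims(T, (1, 3, 2)), nd * nw, na)
```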

Perform NMFk analyses

Here, the NMFk results are loaded from prior ML runs.

As seen from the output above, the NMFk analyses identified 3 as the optimal number of geothermal signatures in the dataset.

Solutions with fewer than 3 signatures underfit the data.

Solutions with more than 3 signatures overfit the data and are unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2 and 3 signatures.

Post-process NMFk results

Number of signatures

Plot of the solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The plot above also demonstrates that the acceptable solutions contain 2 and 3 signatures.

Analysis of all the acceptable solutions

The ML solutions containing an acceptable number of signatures are further analyzed as follows:

Analysis of the 3-signature solution

The results for the 3-signature solution presented above are discussed further here.

The well attributes are clustered into 3 groups:

This grouping is based on analyses of the attribute matrix W:

[Figure: attributes-3-labeled-sorted]

Note that the attribute matrix W is automatically modified to account for the vertical depths used in characterizing the analyzed datasets.

The well locations are also clustered into 3 groups:

This grouping is based on analyses of the location matrix H:

[Figure: locations-3-labeled-sorted]

The map ../figures-set00-v9-inv-750-1000-dlan/locations-3-map.html provides an interactive visualization of the extracted well location groups (the HTML file can also be opened in any browser).